fix: capture DashScope multimodal media outputs by sipercai · Pull Request #227 · alibaba/loongsuite-python

sipercai · 2026-06-23T06:38:25Z

Description

This PR updates DashScope MultiModalConversation output parsing so media URLs returned in response content are captured as output URI parts. Image and video content items now become Uri parts in gen_ai.output.messages, matching the existing text/audio handling and allowing downstream multimodal processing to see generated media outputs.

Fixes # (N/A)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

How Has This Been Tested?

git diff --check
python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_multimodal_conversation.py -q
python -m pytest instrumentation-loongsuite/loongsuite-instrumentation-dashscope/tests/test_image_synthesis.py -q
python -m ruff check instrumentation-loongsuite/loongsuite-instrumentation-dashscope
python "$PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py" --repo .
tox -e precommit
tox -c tox-loongsuite.ini -e py313-test-loongsuite-instrumentation-dashscope-oldest,py313-test-loongsuite-instrumentation-dashscope-latest

Validation Evidence

Spec and Scope

Linked issue/spec: N/A; local bug report for DashScope MultiModalConversation image output capture.
Approved spec/comment: User requested direct implementation for response.output.choices[0].message.content = [{"image": "..."}].
Changed surface: loongsuite-instrumentation-dashscope output message extraction and tests.

Local Checks

Check	Command	Result	Notes
Static readiness	`python "$PIPELINE_SKILL_DIR/scripts/check_loongsuite_pr_readiness.py" --repo .`	pass	LoongSuite static readiness checks passed.
Precommit	`tox -e precommit`	pass	Completed repository precommit gate.
Focused tests	`tox -c tox-loongsuite.ini -e py313-test-loongsuite-instrumentation-dashscope-oldest,py313-test-loongsuite-instrumentation-dashscope-latest`	pass	Both environments reported 51 passed, 4 skipped.
Focused lint	`tox -c tox-loongsuite.ini -e lint-loongsuite-instrumentation-dashscope`	blocked	Environment dependency download stalled; direct `python -m ruff check instrumentation-loongsuite/loongsuite-instrumentation-dashscope` passed.
Claude review	Local Codex-Claude review loop	pass	No blocking findings remained after review/fix/re-review.
Privacy scan	scan changed files for local paths, bearer tokens, API keys, and secret-looking keys	pass	No hits in changed files.

Real E2E Matrix

Scenario	Status	Command or Demo	Evidence
non-streaming	pass	Live DashScope `MultiModalConversation.call(model="wan2.7-image")` smoke	Real response returned image content and local span contained `gen_ai.output.messages` with `modality=image` URI.
streaming	pass	`tox -c tox-loongsuite.ini -e py313-test-loongsuite-instrumentation-dashscope-oldest,py313-test-loongsuite-instrumentation-dashscope-latest`	Existing streaming multimodal tests passed in both focused environments.
concurrency	blocked	Not run for this narrow output-parser fix	Run a bounded two-call DashScope smoke before marking ready if concurrency evidence is required.
agent/tool/ReAct	N/A	DashScope SDK media output parser has no agent/tool surface	Not an agent framework integration.
tool-heavy	N/A	DashScope SDK media output parser has no tool-calling surface	Not a tool orchestration integration.
error path	pass	Focused DashScope tox suites	Existing error-handling tests passed in both focused environments.

Telemetry and Weaver

Check	Status	Command or Artifact	Notes
Span tree / span kinds	pass	Live DashScope smoke plus ARMS readback	ARMS readback confirmed a `chat wan2.7-image` LLM span with image URI in `gen_ai.output.messages`.
Content capture modes	pass	Focused unit test with `SPAN_ONLY`; existing no-content tests in DashScope suite	New test asserts image URI is written when content capture is enabled; existing no-content tests continue to pass.
Concurrency isolation	blocked	Not run	Run a bounded concurrent smoke before ready-for-review if required.
Weaver live-check	blocked	`weaver registry live-check -r <loongsuite-semantic-conventions-registry> --advice-profile loongsuite-genai ...`	Not run in this draft preparation; ARMS readback was used for telemetry evidence.

CI

GitHub checks: pending PR creation.
Known unrelated failures: none identified.
Follow-up needed: rerun focused lint tox and Weaver live-check before moving this PR out of draft if maintainers require those gates.

Does This PR Require a Core Repo Change?

Yes. - Link to PR:
No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

Followed the style guidelines of this project
Changelogs have been updated
Unit tests have been added
Documentation has been updated

ralf0131

Summary

DashScope MultiModalConversation output parsing now captures image and video content items as Uri parts in gen_ai.output.messages, extending the existing text/audio handling to all media types returned by the API. The implementation correctly mirrors the input-side extraction pattern (text → image → audio → video) and the existing Uri construction used throughout the file.

Findings

No issues found. The new elif branches follow the established pattern exactly — same Uri(uri, modality, mime_type=None, type="uri") shape as the existing audio handling, consistent with both _extract_multimodal_input_messages and the image/video synthesis paths.

Test Coverage

Parametrized unit test covers image, audio, and video URI extraction (audio was previously untested at this level — nice bonus).
Mixed-content test (text + image) verifies part ordering is preserved.
End-to-end span attribute test validates the full pipeline from response parsing through to gen_ai.output.messages on the finished span.

Compatibility

Purely additive — new elif branches, no change to existing behavior or public API. Backward compatible.

Automated review by github-manager-bot

sipercai marked this pull request as ready for review June 23, 2026 06:39

fix: capture dashscope multimodal media outputs

8a17210

sipercai force-pushed the fix/dashscope-multimodal-output-uri branch from 265a89a to 8a17210 Compare June 23, 2026 07:16

ralf0131 approved these changes Jun 23, 2026

View reviewed changes

github-actions Bot assigned 123liuziming, Cirilla-zmh and ralf0131 Jun 23, 2026

github-actions Bot requested review from 123liuziming and Cirilla-zmh June 23, 2026 07:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: capture DashScope multimodal media outputs#227

fix: capture DashScope multimodal media outputs#227
sipercai wants to merge 1 commit into
mainfrom
fix/dashscope-multimodal-output-uri

sipercai commented Jun 23, 2026

Uh oh!

ralf0131 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

sipercai commented Jun 23, 2026

Description

Type of change

How Has This Been Tested?

Validation Evidence

Spec and Scope

Local Checks

Real E2E Matrix

Telemetry and Weaver

CI

Does This PR Require a Core Repo Change?

Checklist:

Uh oh!

ralf0131 left a comment

Choose a reason for hiding this comment

Summary

Findings

Test Coverage

Compatibility

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants